Provenance Algebra and Materialized View-based Provenance Management
Authors
Abstract
Provenance, from the French word "provenir", meaning "to come from", describes the lineage of an entity. Provenance is critical information in eScience for accurately interpreting scientific results. Although information provenance has been recognized as a hard problem in computing science (British Computing Society, 2004), many fundamental research issues in provenance have yet to be addressed. A common provenance model with well-defined formal semantics, which would facilitate interoperability of provenance metadata from different sources, has not yet been defined. Another important issue is the lack of a systematic study of provenance query characteristics across multiple applications. A classification or taxonomy of provenance queries would not only help to better understand provenance metadata, but would also enable the definition of provenance query operators. Finally, although the provenance relevant to a user or an application is a specific view over all available provenance metadata, no provenance management system that stores provenance as views has yet been implemented. In this paper we propose a novel provenance algebra consisting of a common provenance model called provenir, defined in the description logic-based W3C Web Ontology Language (OWL-DL), together with a set of provenance query operators derived from a classification of provenance queries. We also introduce a practical provenance storage solution that uses materialized views over a generic relational database system. Our approach takes advantage of the provenance query operators and well-defined indices to efficiently process complex provenance queries over very large datasets. To support our claims, we present an evaluation of both the performance and the scalability of our initial implementation. To the best of our knowledge, this is the first provenance management system that supports the complete process, from a formal provenance model and query operators to storage and efficient querying of provenance data.
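The abstract describes storing provenance as materialized views in a relational database and answering lineage queries with dedicated operators. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes provenance is reduced to a single hypothetical `derived_from(entity, source)` relation (a drastic simplification of the provenir ontology) and uses Python's standard `sqlite3` module. Because SQLite has no `CREATE MATERIALIZED VIEW`, the view is emulated by a table precomputed from a recursive query and indexed on `entity`. The function and table names (`build_store`, `materialize_lineage`, `provenance_of`, `lineage_mv`) are hypothetical.

```python
import sqlite3


def build_store(edges):
    """Create an in-memory store of hypothetical derived_from(entity, source) links."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE derived_from (entity TEXT, source TEXT)")
    conn.executemany("INSERT INTO derived_from VALUES (?, ?)", edges)
    return conn


def materialize_lineage(conn):
    """(Re)build an emulated materialized view holding the transitive lineage.

    SQLite lacks materialized views, so this drops and fully recomputes an
    ordinary table from a recursive query (a full refresh, not incremental).
    """
    conn.execute("DROP TABLE IF EXISTS lineage_mv")  # also drops its index
    conn.execute("CREATE TABLE lineage_mv (entity TEXT, ancestor TEXT)")
    rows = conn.execute(
        """
        WITH RECURSIVE closure(entity, ancestor) AS (
            SELECT entity, source FROM derived_from
            UNION  -- UNION discards duplicate rows, so cycles terminate
            SELECT c.entity, d.source
            FROM closure AS c JOIN derived_from AS d ON d.entity = c.ancestor
        )
        SELECT entity, ancestor FROM closure
        """
    ).fetchall()
    conn.executemany("INSERT INTO lineage_mv VALUES (?, ?)", rows)
    # Index the lookup column so lineage queries avoid a full scan.
    conn.execute("CREATE INDEX idx_lineage_entity ON lineage_mv(entity)")


def provenance_of(conn, entity):
    """Hypothetical lineage operator: every entity `entity` was transitively derived from."""
    rows = conn.execute(
        "SELECT ancestor FROM lineage_mv WHERE entity = ? ORDER BY ancestor",
        (entity,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = build_store([
        ("result", "analysis"),
        ("analysis", "raw_data"),
        ("analysis", "parameters"),
        ("raw_data", "instrument_run"),
    ])
    materialize_lineage(conn)
    print(provenance_of(conn, "result"))
    # prints ['analysis', 'instrument_run', 'parameters', 'raw_data']
```

Precomputing the closure trades storage and refresh cost for fast lookups: a lineage query becomes a single indexed selection instead of a recursive traversal at query time, which is the general motivation for view-based provenance storage.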
Similar articles
Ontology-Driven Provenance Management in eScience: An Application in Parasite Research
Provenance, from the French word “provenir”, describes the lineage or history of a data entity. Provenance is critical information in scientific applications to verify experiment process, validate data quality and associate trust values with scientific results. Current industrial scale eScience projects require an end-to-end provenance management infrastructure. This infrastructure needs to be ...
PrOM: A Semantic Web Framework for Provenance Management in Science
The eScience paradigm is enabling researchers to collaborate over the Web in virtual laboratories and conduct experiments on an industrial scale. But, the inherent variability in the quality and trust associated with eScience resources necessitates the use of provenance information describing the origin of an entity. Existing systems often model provenance using ambiguous terminology, have poor...
Fine-Grained Provenance Inference for a Large Processing Chain with Non-materialized Intermediate Views
Many applications facilitate a data processing chain, i.e. a workflow, to process data. Results of intermediate processing steps may not be persistent since reproducing these results are not costly and these are hardly re-usable. However, in stream data processing where data arrives continuously, documenting fine-grained provenance explicitly for a processing chain to reproduce results is not a...
Formal Foundations of Reenactment and Transaction Provenance
Provenance is essential for auditing, data debugging, understanding transformations, and many additional use cases. All these use cases would benefit from provenance for transactional updates. We present a provenance model for snapshot isolation transactions extending the semiring framework with version annotations and updates. Based on this model, we present the first solution for computing th...
Collaborative Data Sharing with Mappings and Provenance
COLLABORATIVE DATA SHARING WITH MAPPINGS AND PROVENANCE Todd J. Green Supervisors: Zachary G. Ives and Val Tannen A key challenge in science today involves integrating data from databases managed by different collaborating scientists. In this dissertation, we develop the foundations and applications of collaborative data sharing systems (CDSSs), which address this challenge. A CDSS allows colla...
Publication date: 2008